Data Science Workbench - Building Your Model
Retailers with analytics teams and data scientists can define new algorithms and test them against the 200+ road-tested strategies that are included in the standard Recommend license.
After you've created a table to be the model behind a strategy, you can publish the strategy and use it in your site placements just like any other strategy.
Your site must be configured to use the Data Science Workbench and your FTP credentials must be created. Contact your Algonomy team if either of these is not the case.
There are size limits per strategy. A query or strategy should not exceed 1 GB. Anything that exceeds this limit will be flagged and monitored, and may be revised for performance reasons.
Basic Workflow
- Build a table that translates your recommendation idea into an algorithm.
  - Who: Technical team (SQL experience highly recommended)
  - Where: Personalization Dashboard > Optimization > Data Science Workbench > Data Source > Query Editor
- Create a strategy and publish it. For more information on how to create your strategy, see Creating Strategies in the Data Science Workbench.
  - Who: Technical team or administrators
  - Where: Personalization Dashboard > Optimization > Data Science Workbench > Strategies > New Strategy
- Use the strategy in your recommendations.
  - Who: Administrators
  - Where: Personalization Dashboard > Recommendations > Strategy Configuration > Production
Building a Table in the Query Editor
The first step in building your own strategy is to assemble the data you need into a table.
The query editor gives you access to your site's clickstream data, as well as any other data you've brought in through Build. You can also use Hive to build your table.
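As a rough illustration of what "assembling the data into a table" can look like, the sketch below builds a key, product, score table with a single aggregate query. It is not the Data Science Workbench schema: an in-memory SQLite database stands in for Hive, an attached schema named `work` stands in for the work database, and the `product_views` table, its columns, and the `category_top_viewed` table name are all hypothetical.

```python
import sqlite3

# SQLite in memory stands in for Hive here. Every table and column name
# below is hypothetical and used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS work")

# Hypothetical clickstream extract: one row per product view.
conn.execute("CREATE TABLE product_views (category_id TEXT, product_id TEXT)")
conn.executemany(
    "INSERT INTO product_views VALUES (?, ?)",
    [
        ("c-10", "p-1"), ("c-10", "p-1"), ("c-10", "p-1"),
        ("c-10", "p-2"),
        ("c-20", "p-3"), ("c-20", "p-3"),
    ],
)

# Build the model table: category ID as the key, view count as the score.
conn.execute(
    """
    CREATE TABLE work.category_top_viewed AS
    SELECT category_id AS strategy_key,
           product_id,
           COUNT(*)    AS score
    FROM product_views
    GROUP BY category_id, product_id
    """
)

# Read the table back, highest score first within each key.
rows = conn.execute(
    "SELECT strategy_key, product_id, score "
    "FROM work.category_top_viewed "
    "ORDER BY strategy_key, score DESC"
).fetchall()

for row in rows:
    print(row)
```

In the Query Editor you would write only the SQL, against your own site's tables. The Python wrapper exists only to make the sketch runnable on its own.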
Tips
- Create your table in the work database.
- Keep in mind that strategies always use key values associated with product IDs and scores. The Strategy Structure gives you the details.
- If you are planning to use the scheduling functionality, make sure your SQL query and table exist in Hive.
- Catalog data is loaded into the Data Science Workbench once per day, based on a snapshot of the catalog at the time of the update.
- If your strategy fails to build, verify that the associated query contains no comments. Comments can cause the query to hang, which prevents the model from building and causes the strategy to fail.
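If you draft queries elsewhere and paste them into the Query Editor, one way to follow the last tip is to strip comments first. The helper below is a minimal sketch, not a Workbench feature: `strip_sql_comments` is a hypothetical name, and it assumes comments use the common `--` and `/* ... */` forms and that string literals are quoted with single or double quotes.

```python
import re

# Match string literals first so that comment markers inside quotes survive.
_SQL_TOKEN = re.compile(
    r"""
    ('(?:[^'\\]|\\.)*'      # group 1: single-quoted string literal
    |"(?:[^"\\]|\\.)*")     #          or double-quoted string literal
    |(--[^\n]*              # group 2: line comment up to end of line
    |/\*.*?\*/)             #          or block comment
    """,
    re.VERBOSE | re.DOTALL,
)


def strip_sql_comments(query: str) -> str:
    """Return the query with line and block comments replaced by a space."""

    def _replace(match: re.Match) -> str:
        # Keep string literals untouched; replace comments with one space
        # so that neighboring tokens do not run together.
        return match.group(1) if match.group(1) is not None else " "

    return _SQL_TOKEN.sub(_replace, query)


if __name__ == "__main__":
    draft = "SELECT a -- pick a\nFROM t /* note */ WHERE b = '--x'"
    print(strip_sql_comments(draft))
```

Review the cleaned query before saving it; a pattern-based approach like this one is a convenience, not a full SQL parser.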
Strategy Structure
Custom strategies always return lists of product IDs. They can use any one of these key values:
- product category ID
- product ID
- product brand ID
- customer segment ID
- user ID
The key value is used as the seed for the strategy: what piece of information from the page should the strategy use to choose which products to recommend?
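A conceptual way to picture the seed is as a lookup: the key value from the page selects a ranked list of product IDs. The sketch below models that idea only; it is not how the platform serves recommendations, and `build_lookup`, `recommend`, and the sample IDs are hypothetical.

```python
from collections import defaultdict


def build_lookup(rows):
    """Group (key, product_id, score) rows by key, highest score first."""
    by_key = defaultdict(list)
    for key, product_id, score in rows:
        by_key[key].append((score, product_id))
    return {
        key: [pid for _, pid in sorted(pairs, key=lambda p: p[0], reverse=True)]
        for key, pairs in by_key.items()
    }


def recommend(lookup, seed, limit=3):
    """Return up to `limit` product IDs for the seed, or an empty list."""
    return lookup.get(seed, [])[:limit]


if __name__ == "__main__":
    # Hypothetical rows keyed by product category ID.
    table = [
        ("cat-1", "p-100", 0.20),
        ("cat-1", "p-200", 0.90),
        ("cat-2", "p-300", 0.50),
    ]
    lookup = build_lookup(table)
    print(recommend(lookup, "cat-1"))
    print(recommend(lookup, "cat-9"))
```

A seed with no matching key returns nothing in this model.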
Sitewide Strategy Tables
Sitewide strategies require a table that contains a column labeled "sitewide", and that column must be selected as the key for the strategy. Otherwise, the sitewide strategy will not work.
Composable Strategy Tables
Composable strategies use more than one key value as the seed. For this to function properly, the table behind the strategy must hold all keys in a single column, separated by semicolons. The columns must also be in the following order: Key, Product ID, Score.
For example, for a strategy with a user and category key, the table should look something like this:
Key | Product ID | Score |
---|---|---|
user_id;category_id | product_id | Score |
1234;p-13387 | p4595013 | 0.34 |
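If you generate rows for a composable strategy programmatically, the key can be assembled with a small helper such as the sketch below. The function names are hypothetical, and rejecting key parts that already contain a semicolon is an assumption made here to keep the combined key unambiguous, not a documented rule.

```python
KEY_SEPARATOR = ";"


def composite_key(*parts):
    """Join key parts (for example user ID and category ID) with semicolons."""
    parts = [str(part) for part in parts]
    for part in parts:
        if not part:
            raise ValueError("key parts must not be empty")
        if KEY_SEPARATOR in part:
            raise ValueError(f"key part {part!r} contains the separator")
    return KEY_SEPARATOR.join(parts)


def composable_row(key_parts, product_id, score):
    """Return one row in the required column order: Key, Product ID, Score."""
    return (composite_key(*key_parts), str(product_id), float(score))


if __name__ == "__main__":
    # Same values as the example table above.
    print(composable_row(("1234", "p-13387"), "p4595013", 0.34))
```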
Definition and File Format
Custom strategies are defined by a simple table with three columns:
Field | Format | Description |
---|---|---|
Key | Alphanumeric | The input ("seed") for the strategy. The seed value is provided through instrumentation for each call that potentially uses the strategy. |
Product | Alphanumeric | Product ID (from your product catalog data feed). |
Score | Numeric | Scores should be sorted in descending order. |
The Key field can hold any one of the following key types:
Key type | Description |
---|---|
customer segment ID | A unique identifier for a customer segment, as instrumented on your site. |
product ID | A unique identifier for the item (sometimes called the item ID). See the Relevance Cloud Developers Site for a detailed description. |
product brand ID | A unique identifier for a brand, often instrumented in order to enable brand filtering. |
product category ID | The category ID associated with a product or set of products, used to provide category context. |
user ID | A user ID, generally connected to an email address, is a direct link to the login credentials a consumer uses to log into and buy products from an e-commerce site. See the Relevance Cloud Developers Site for a detailed description. |
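Before loading rows into a table, it can help to check them against the three-column definition. The sketch below is one possible check, not a platform requirement: `prepare_rows` is a hypothetical helper, and sorting by score within each key is an assumption about what "sorted in descending order" means in practice.

```python
import math


def prepare_rows(rows):
    """Validate (key, product_id, score) rows and sort them for loading.

    Rows are ordered by key, then by score from highest to lowest.
    """
    cleaned = []
    for key, product_id, score in rows:
        key = str(key).strip()
        product_id = str(product_id).strip()
        if not key or not product_id:
            raise ValueError("key and product ID must not be empty")
        score = float(score)  # raises ValueError for non-numeric scores
        if not math.isfinite(score):
            raise ValueError("score must be a finite number")
        cleaned.append((key, product_id, score))
    cleaned.sort(key=lambda row: (row[0], -row[2]))
    return cleaned


if __name__ == "__main__":
    # Hypothetical rows keyed by product brand ID.
    sample = [
        ("brand-2", "p-9", "0.10"),
        ("brand-1", "p-7", 0.40),
        ("brand-1", "p-8", 0.75),
    ]
    for row in prepare_rows(sample):
        print(row)
```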
Data Updates
Catalog Updates: Catalog data is loaded into DSW once a day, starting at roughly 2:30 AM PT. The data for each site comes from a Hadoop extract of the catalog that is generated three times a day. The load is not directly connected to feed processing.
DSW Data Updates: The timing of scheduled updates varies with how long the query takes to execute. Once the query has finished, the extract and model build are relatively quick. In general, updates should complete shortly after the scheduled update time.